Exploiting Shared Memory to Improve Parallel I/O Performance
Authors
Abstract
We explore several methods utilizing system-wide shared memory to improve the performance of MPI-IO, particularly for noncontiguous file access. We introduce an abstraction called the datatype iterator that permits efficient, dynamic generation of (offset, length) pairs for a given MPI derived datatype. Combining datatype iterators with overlapped I/O and computation, we demonstrate how a shared memory MPI implementation can utilize more than 90% of the available disk bandwidth (in some cases representing a 5× performance improvement over existing methods) even for extreme cases of noncontiguous datatypes. We generalize our results to suggest possible parallel I/O performance improvements on systems without global shared memory.
Similar papers
Specification and Performance Evaluation of Parallel I/O Interfaces for OpenMP
One of the most severe performance limitations of parallel applications today stems from the performance of I/O operations. Numerous projects have shown that parallel I/O in combination with parallel file systems can significantly improve the performance of I/O operations. However, as of today there is no support for parallel I/O operations for applications using shared-memory programming mode...
Development and Evaluation of High-Performance Decorrelation Algorithms for the Nonalternating 3D Wavelet Transform
We introduce and evaluate the implementations of three parallel video-sequence decorrelation algorithms. The proposed algorithms are based on the nonalternating classic three-dimensional wavelet transform (3D-WT). The parallel implementations of the algorithms are developed and tested on a shared memory system, an SGI Origin 3800 supercomputer, making use of a message-passing paradigm. We evalua...
Efficient Machine-Independent Programming of High-Performance Multiprocessors
A major component of the success of scientific computing is the rapid increase in computing capability. Parallel computing can provide the next great leap in the computation power scientists and engineers need to solve many important problems. The proliferation of parallel architectures, however, discourages users from writing parallel applications. Recent advances in automatic parallelization ...
A comprehensive distributed shared memory system that is easy to use and program
An analysis of the distributed shared memory (DSM) work carried out by other researchers shows that it has been able to improve the performance of applications, at the expense of ease of programming and use. Many implementations require application programmers to write code to explicitly associate shared variables with synchronization variables or to label the variables according to their acces...
Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures
This paper presents a new parallelization method for an efficient implementation of unstructured array reductions on shared memory parallel machines with OpenMP. This method is strongly related to parallelization techniques for irregular reductions on distributed memory machines as employed in the context of High Performance Fortran. By exploiting data locality, synchronization is minimized witho...